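We present HRF-Net, a novel view synthesis method based on holistic radiance fields that renders novel views from a sparse set of inputs. Recent generalizable view synthesis methods also leverage radiance fields, but their rendering speed is not real-time. Other existing methods can train and render novel views efficiently, but they cannot generalize to unseen scenes. Our approach addresses the problem of real-time rendering for generalizable view synthesis and consists of two main stages: a holistic radiance field predictor and a convolution-based neural renderer. This architecture not only infers consistent scene geometry based on implicit neural fields, but also renders new views efficiently on a single GPU. We first train HRF-Net on multiple 3D scenes of the DTU dataset, and the network can produce plausible novel views on unseen real and synthetic data using only photometric losses. Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations, while still maintaining the high-speed rendering of the pre-trained model. Experimental results show that HRF-Net outperforms state-of-the-art neural rendering methods on various synthetic and real datasets.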
Video understanding is a growing field and a subject of intense research. It includes many interesting tasks that require understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting a contextual visual representation from a given untrimmed video, which is difficult because of the long and complicated temporal structure of unconstrained videos. Unlike existing approaches, which apply a pre-trained backbone network as a black box to extract visual representations, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors: the actors, the relevant objects, and the surrounding environment. It is therefore crucial to design a contextual, explainable video representation extraction method that can capture each of these factors and model the relationships between them. In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human-perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that generates images with controllable quality. The network can synthesize various image degradations and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides, at no extra cost, an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
Predictive simulations of the shock-to-detonation transition (SDT) in heterogeneous energetic materials (EM) are vital to the design and control of their energy release and sensitivity. Due to the complexity of the thermo-mechanics of EM during the SDT, both the macroscale response and the sub-grid mesoscale energy localization must be captured accurately. This work proposes an efficient and accurate multiscale framework for SDT simulations of EM. We employ deep learning to model the mesoscale energy localization of shock-initiated EM microstructures; the resulting predictions supply reaction progress rate information to the macroscale SDT simulation. The proposed multiscale modeling framework is divided into two stages. First, a physics-aware recurrent convolutional neural network (PARC) is used to model the mesoscale energy localization of shock-initiated heterogeneous EM microstructures. PARC is trained using direct numerical simulations (DNS) of hotspot ignition and growth within microstructures of pressed HMX material subjected to different input shock strengths. After training, PARC is employed to supply hotspot ignition and growth rates for macroscale SDT simulations. We show that PARC can play the role of a surrogate model in a multiscale simulation framework, while drastically reducing the computation cost and providing improved representations of the sub-grid physics. The proposed multiscale modeling approach will provide a new tool for material scientists in designing high-performance and safer energetic materials.
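Although classical scaling, just like principal component analysis, is parameter-free, most other methods for embedding multivariate data require the selection of one or several parameters. This tuning can be difficult due to the unsupervised nature of the situation. We propose a simple, almost obvious, approach to supervise the choice of tuning parameter(s): minimize a notion of stress. We substantiate this choice by reference to rigidity theory. We extend a result by Aspnes et al. (IEEE Mobile Computing, 2006), showing that general random geometric graphs are trilateration graphs with high probability. We also provide a stability result à la Anderson et al. (SIAM Discrete Mathematics, 2010). We illustrate this approach in the context of the MDS-MAP(P) algorithm of Shang and Ruml (IEEE INFOCOM, 2004). As a prototypical patch-stitching method, it requires the choice of a patch size, and we use the stress to make that choice data-driven. In this context, we perform a number of experiments to illustrate the validity of using the stress as the basis for tuning parameter selection. In so doing, we uncover a bias-variance tradeoff, a phenomenon that may have been overlooked in the multidimensional scaling literature. By turning MDS-MAP(P) into a method for manifold learning, we obtain a local version of Isomap for which the minimization of the stress may also be used for parameter tuning.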
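The idea of choosing a tuning parameter by minimizing stress can be illustrated with a minimal sketch. It is not the MDS-MAP(P) procedure from the abstract: as a stand-in, it tunes the neighborhood size `k` of a plain Isomap-style embedding, and it measures a normalized stress on a fixed reference set of short-range pairs (`k_ref` nearest neighbors). All function names, the reference set, and the toy data are assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_graph_distances(D, k):
    """Shortest-path distances in the symmetrized k-nearest-neighbor graph."""
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]
    for m in range(n):  # Floyd-Warshall, vectorized over all pairs
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return G

def classical_scaling(D, dim):
    """Classical (Torgerson) scaling of a distance matrix into `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def stress(Y, D, mask):
    """Normalized squared mismatch between embedded and given distances on `mask`."""
    E = pairwise_distances(Y)
    return float(((E[mask] - D[mask]) ** 2).sum() / (D[mask] ** 2).sum())

def select_k_by_stress(X, candidates, dim, k_ref=5):
    """Return the candidate k with the smallest stress, plus all scores."""
    D = pairwise_distances(X)
    n = D.shape[0]
    ref = np.argsort(D, axis=1)[:, 1:k_ref + 1]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k_ref), ref.ravel()] = True
    scores = {}
    for k in candidates:
        G = knn_graph_distances(D, k)
        if not np.isfinite(G).all():  # disconnected graph: reject this k
            scores[k] = float("inf")
            continue
        scores[k] = stress(classical_scaling(G, dim), D, mask)
    return min(scores, key=scores.get), scores

# Toy data (an assumption): 80 points on a rolled-up 2D sheet in 3D.
rng = np.random.default_rng(0)
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, size=80)
h = rng.uniform(0.0, 10.0, size=80)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
best_k, scores = select_k_by_stress(X, candidates=[4, 8, 16, 40], dim=2)
print(best_k, scores)
```

Evaluating the stress on a reference set that does not change with `k` keeps the scores comparable across candidates; a stress computed on the k-nearest-neighbor graph itself would change its own yardstick with every candidate.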
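Convolutional neural networks (ConvNets or CNNs) have been widely deployed in computer vision and related fields. Nevertheless, the training dynamics of these neural networks remain elusive: training them is hard and computationally expensive. A myriad of architectures and training strategies have been proposed to overcome this challenge and to address several problems in image processing, such as speech, image, and action recognition as well as object detection. In this article, we propose a novel training method for ConvNets based on particle swarm optimization (PSO). In this framework, the weight vector of each ConvNet is cast as the position of a particle in phase space, so that PSO collaborative dynamics intertwine with stochastic gradient descent (SGD) to boost training performance and generalization. Our approach goes as follows: i) [regular phase] each ConvNet is trained independently via SGD; ii) [collaborative phase] ConvNets share among themselves their current weight vectors (or particle positions) along with their gradient estimates of the loss function. Different ConvNets use different step sizes. By properly blending ConvNets with large (possibly random) step sizes with more conservative ones, we propose an algorithm with competitive performance relative to other PSO-based approaches on CIFAR-10 (accuracy of 98.31%). These accuracy levels are obtained by resorting to only four ConvNets, and such results are expected to scale with the number of collaborating ConvNets. We make our source code available for download at https://github.com/leonlha/pso-convnet-dynamics.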
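The alternation between a regular SGD phase and a collaborative PSO phase can be sketched on a toy problem. The sketch below is an assumption-laden illustration, not the authors' update rule: it replaces ConvNets with linear least-squares models, and the collaborative step uses the textbook PSO velocity update (inertia, personal-best and global-best attraction) with an added gradient term. The names `train_swarm`, `inertia`, `c_personal`, and `c_global` are hypothetical.

```python
import numpy as np

def loss_and_grad(w, A, b):
    """Mean squared-error loss (halved) and its gradient for a linear model."""
    r = A @ w - b
    return 0.5 * float(r @ r) / len(b), A.T @ r / len(b)

def train_swarm(A, b, step_sizes, rounds=50, sgd_steps=5, batch=16,
                inertia=0.5, c_personal=0.5, c_global=0.5, seed=0):
    """Alternate independent SGD with a PSO-style collaborative update."""
    rng = np.random.default_rng(seed)
    n_particles, dim = len(step_sizes), A.shape[1]
    W = rng.normal(size=(n_particles, dim))  # one weight vector per particle
    V = np.zeros_like(W)                     # particle velocities
    best_W = W.copy()
    best_loss = np.array([loss_and_grad(w, A, b)[0] for w in W])

    def update_bests():
        for p in range(n_particles):
            current, _ = loss_and_grad(W[p], A, b)
            if current < best_loss[p]:
                best_loss[p], best_W[p] = current, W[p].copy()

    for _ in range(rounds):
        # Regular phase: each particle runs mini-batch SGD on its own.
        for p in range(n_particles):
            for _ in range(sgd_steps):
                idx = rng.choice(len(b), size=batch, replace=False)
                _, g = loss_and_grad(W[p], A[idx], b[idx])
                W[p] -= step_sizes[p] * g
        update_bests()
        # Collaborative phase: particles share positions and gradients.
        g_best = best_W[np.argmin(best_loss)].copy()
        for p in range(n_particles):
            _, g = loss_and_grad(W[p], A, b)
            r1, r2 = rng.random(2)
            V[p] = (inertia * V[p]
                    + c_personal * r1 * (best_W[p] - W[p])
                    + c_global * r2 * (g_best - W[p])
                    - step_sizes[p] * g)
            W[p] = W[p] + V[p]
        update_bests()

    winner = int(np.argmin(best_loss))
    return best_W[winner], float(best_loss[winner])

# Toy noise-free regression problem (an assumption for illustration).
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
b = A @ w_true
# Four particles with distinct step sizes, from conservative to aggressive.
w_hat, final_loss = train_swarm(A, b, step_sizes=[0.01, 0.05, 0.1, 0.3])
print(final_loss)
```

Mixing small and large step sizes mirrors the blending described in the abstract: aggressive particles explore quickly, while the attraction toward the shared best position keeps the conservative ones from lagging behind.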
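Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction. It has attracted much attention from the computational linguistics and artificial intelligence research communities in recent years because of the strong development of models based on machine reading comprehension. A reader-based QA system is a high-level search engine that uses machine reading comprehension (MRC) techniques to find correct answers to queries or questions in open-domain or domain-specific texts. Most advances in data resources and machine learning approaches for MRC and QA systems have been made in resource-rich languages, notably English and Chinese. A low-resource language like Vietnamese has seen little research on QA systems. This paper presents XLMRQA, the first Vietnamese QA system that uses a transformer-based reader on a Wikipedia-based textual knowledge source (using the UIT-ViQuAD corpus). It outperforms DrQA and BERTserini, two robust QA systems based on deep neural network models, by 24.46% and 6.28%, respectively. From the results obtained on the three systems, we analyze the influence of question types on the performance of QA systems.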
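Novel view synthesis for humans in motion is a challenging computer vision problem that enables applications such as free-viewpoint video. Existing methods typically use complex setups with multiple input views, 3D supervision, or pre-trained models that do not generalize well to new identities. To address these limitations, we present a novel view synthesis framework that generates realistic renders from unseen views of any human captured by a single-view sensor with sparse RGB-D, similar to a low-cost depth camera, and without actor-specific models. We propose an architecture that learns dense features in novel views obtained by sphere-based neural rendering and creates complete renders using a global context inpainting model. Additionally, an enhancer network improves the overall fidelity, even in areas occluded in the original view, producing crisp renders with fine details. We show that our method generates high-quality novel views of synthetic and real human actors given a single sparse RGB-D input. It generalizes to unseen identities and new poses, and it faithfully reconstructs facial expressions. Our approach outperforms prior human view synthesis methods and is robust to different levels of input sparsity.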